Statistics of multiple PWM occurrences in biopolymer sequences
Similar resources
On counting position weight matrix matches in a sequence, with application to discriminative motif finding
MOTIVATION AND RESULTS: The position weight matrix (PWM) is a popular method to model transcription factor binding sites. A fundamental problem in cis-regulatory analysis is to "count" the occurrences of a PWM in a DNA sequence. We propose a novel probabilistic score to solve this problem of counting PWM occurrences. The proposed score has two important properties: (1) It gives appropriate weigh...
A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences
The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA), which is the representative of a PHMM, can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...
Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymer
The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences. Multiple motifs are found by fitting a mixture model to the data, probabilistically erasing the occurrences of the motif thus found, and repeating the process to find...
Approximation of word counts in Markov chains
In this talk, we give an overview of the different approximation results existing on the statistical distribution of word counts in a Markov chain. Results concerning the number of overlapping occurrences, the number of non-overlapping occurrences (renewals) and the declumped count will be presented. Counts of single words but also multiple words and word families are considered. We will see ...
Overview of the PWMEnrich package
The main functionality of the package is Position Weight Matrix (PWM) enrichment analysis in a single sequence (e.g. enhancer of interest) or a set of sequences (e.g. set of ChIP-chip/seq peaks). Note that this is not the same as de-novo motif finding which discovers novel motifs, nor motif comparison which aligns motifs. The package is built upon Biostrings and offers high-level functions to s...
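As a rough illustration of the PWM counting problem that several of the entries above refer to, the following minimal sketch scores every window of a sequence against a PWM using log-odds and counts the windows above a hard threshold. This is the simple threshold-based baseline, not the probabilistic score proposed in the first entry, whose details are not given here. The matrix values, the uniform background, the threshold, and the names `PWM`, `BACKGROUND`, `log_odds` and `count_hits` are all assumptions made for illustration.

```python
import math

# Hypothetical 3-column PWM: per-position base probabilities (illustrative values only).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

# Assumed uniform background model over the four DNA bases.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds(window, pwm=PWM, background=BACKGROUND):
    """Sum of log2(p_motif / p_background) over the positions of one window."""
    return sum(
        math.log2(column[base] / background[base])
        for column, base in zip(pwm, window)
    )


def count_hits(seq, threshold, pwm=PWM, background=BACKGROUND):
    """Count windows (overlaps allowed, forward strand only) scoring >= threshold."""
    width = len(pwm)
    return sum(
        1
        for i in range(len(seq) - width + 1)
        if log_odds(seq[i:i + width], pwm, background) >= threshold
    )


if __name__ == "__main__":
    print(count_hits("ACGTACG", threshold=4.0))
```

A hard threshold treats a window just below the cutoff as a non-occurrence and ignores how strong each hit is. It also counts overlapping windows as separate occurrences, which is the kind of distinction (overlapping, non-overlapping, declumped counts) raised in the word-count entry above.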